Overview

The gapminder data are related to a famous TED talk given by Hans Rosling. In his talk, Dr. Rosling shows an animated visualization depicting the relationship between life expectancy and average income levels by country. Our goal in this module is to reproduce Dr. Rosling’s visualization.

We will access the gapminder data from the gapminder R package. This package contains a dataset (technically, a tibble) called gapminder with 6 variables:

variable   meaning
country    country
continent  continent
year       year
lifeExp    life expectancy at birth
pop        total population
gdpPercap  per-capita GDP

Per-capita GDP (Gross domestic product) is given in units of international dollars, “a hypothetical unit of currency that has the same purchasing power parity that the U.S. dollar had in the United States at a given point in time” – 2005, in this case.

Note: the gapminder R package exists for the purpose of teaching and making code examples. It is an excerpt of data found in specific spreadsheets on Gapminder.org circa 2010. It is not a definitive source of socioeconomic data.

Pre-requisites

Before starting these exercises, you should be comfortable with the material in

  1. The Data Visualization Basics Primer.

  2. Chapters 1-3 of R for Data Science.

Packages

Load the tidyverse and gapminder packages. We are using tidyverse to access the ggplot2 package and using gapminder to access the data.

Note that I am using the chunk options message = FALSE and echo = TRUE because loading R packages often produces printed output that shows up in your knitted R Markdown document. Setting message = FALSE suppresses printed messages, while setting echo = TRUE ensures that the code in your chunk is printed. This is how I would like you to organize package loading in your homework .Rmd files.
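
As a minimal sketch, a package-loading chunk organized this way might contain the code below. The chunk label load-packages is a hypothetical name chosen for illustration, and the chunk header is shown as a comment because it lives in the .Rmd file rather than in the R code itself.

```r
# Assumed chunk header in the .Rmd file (the label "load-packages" is hypothetical):
#   {r load-packages, message = FALSE, echo = TRUE}
library(tidyverse)  # attaches ggplot2, dplyr, and the other core tidyverse packages
library(gapminder)  # provides the gapminder tibble
```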

Inspect your data

In gapminder, each country has 12 rows distinguished by year.
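
One way to check this claim yourself is sketched below. It assumes the tidyverse and gapminder packages are installed; the object name rows_per_country is only an illustrative choice.

```r
library(tidyverse)
library(gapminder)

# Compact overview of the six variables and their types
glimpse(gapminder)

# Count the rows belonging to each country
rows_per_country <- count(gapminder, country)

# If every country has 12 rows, the minimum and maximum counts are both 12
range(rows_per_country$n)

# The distinct years that separate those rows
sort(unique(gapminder$year))
```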

Exercise 1

Create a scatter plot using gdpPercap as the x-variable and lifeExp as the y-variable:
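
One possible approach is sketched below; it is not the only valid answer. The object name p1 is an illustrative choice.

```r
library(tidyverse)
library(gapminder)

# Map income to the x-axis and life expectancy to the y-axis,
# then draw one point per country-year row
p1 <- ggplot(gapminder, aes(x = gdpPercap, y = lifeExp)) +
  geom_point()

p1
```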

Exercise 2

Modify your figure from exercise 1: transform the scale of your x-axis to be in log base 10 units. (See ?scale_x_log10)
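
A minimal sketch of this step, assuming the scatter plot is rebuilt from scratch (the name p2 is illustrative):

```r
library(tidyverse)
library(gapminder)

# Same scatter plot as before, with the x-axis drawn on a log base 10 scale
p2 <- ggplot(gapminder, aes(x = gdpPercap, y = lifeExp)) +
  geom_point() +
  scale_x_log10()

p2
```

Note that scale_x_log10() changes how the axis is drawn, not the values stored in gapminder.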

Exercise 3

Add x- and y-axis labels to your figure from exercise 2.
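
One way to add labels is with labs(), as sketched below. The label wording is an assumption made for illustration; choose text that suits your own figure.

```r
library(tidyverse)
library(gapminder)

p3 <- ggplot(gapminder, aes(x = gdpPercap, y = lifeExp)) +
  geom_point() +
  scale_x_log10() +
  # Label text below is an illustrative choice, not prescribed by the exercise
  labs(
    x = "GDP per capita (international dollars, log scale)",
    y = "Life expectancy at birth (years)"
  )

p3
```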

Exercise 4

Add a smoothed curve to your plot, showing the overall population trend. (See ?geom_smooth)
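
A possible sketch, using geom_smooth() with its default settings so that ggplot2 picks the smoothing method itself (the name p4 and the label text are illustrative):

```r
library(tidyverse)
library(gapminder)

p4 <- ggplot(gapminder, aes(x = gdpPercap, y = lifeExp)) +
  geom_point() +
  # Default smoother; ggplot2 prints a message naming the method it selected
  geom_smooth() +
  scale_x_log10() +
  labs(
    x = "GDP per capita (international dollars, log scale)",
    y = "Life expectancy at birth (years)"
  )

p4
```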

## `geom_smooth()` using method = 'gam' and formula 'y ~ s(x, bs = "cs")'

Exercise 5

Adjust the points in your graph:

  1. Set their shape to be 21
  2. Set their color to be 'black'
  3. Set their fill to be 'grey'

Adjust the overall population trend as well:

  1. Set the line’s color to be 'red'
  2. Remove the standard errors (shaded region around the line) from the plot.
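
The adjustments listed above could be sketched as follows. Each setting is passed as a fixed argument outside aes() because it applies to every point or to the whole line rather than varying with the data. The name p5 and the label text are illustrative.

```r
library(tidyverse)
library(gapminder)

p5 <- ggplot(gapminder, aes(x = gdpPercap, y = lifeExp)) +
  # Shape 21 is a circle with a separate outline (color) and interior (fill)
  geom_point(shape = 21, color = "black", fill = "grey") +
  # se = FALSE drops the shaded standard-error band around the line
  geom_smooth(color = "red", se = FALSE) +
  scale_x_log10() +
  labs(
    x = "GDP per capita (international dollars, log scale)",
    y = "Life expectancy at birth (years)"
  )

p5
```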

Exercise 6

Go to the ggplot2 theme() reference page and scroll through the pictures that show some of the built-in ggplot2 themes. Pick a theme that you like and add it to the figure you created in exercise 5.
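
As an illustration only, the sketch below adds theme_bw(), one of the complete themes that ships with ggplot2. Which theme to use is a matter of taste, so any other built-in theme could be substituted.

```r
library(tidyverse)
library(gapminder)

p6 <- ggplot(gapminder, aes(x = gdpPercap, y = lifeExp)) +
  geom_point(shape = 21, color = "black", fill = "grey") +
  geom_smooth(color = "red", se = FALSE) +
  scale_x_log10() +
  labs(
    x = "GDP per capita (international dollars, log scale)",
    y = "Life expectancy at birth (years)"
  ) +
  # theme_bw() is an arbitrary pick; swap in whichever built-in theme you prefer
  theme_bw()

p6
```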

Exercise 7

Something changes at the highest income levels: the population trend between income and life expectancy reverses direction. An R package called plotly can help you explore ggplot2 figures interactively. Converting a ggplot2 figure into a plotly figure is straightforward:
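
A minimal sketch of the conversion, assuming the plotly package is installed. The label aesthetic on geom_point() is not used by ggplot2 itself, so ggplot2 may warn that it is ignoring an unknown aesthetic, but ggplotly() picks it up for the hover text. The names p7 and interactive_plot are illustrative.

```r
library(tidyverse)
library(gapminder)
library(plotly)

p7 <- ggplot(gapminder, aes(x = gdpPercap, y = lifeExp)) +
  # label = country is carried along so it can appear when hovering over a point
  geom_point(aes(label = country), shape = 21, color = "black", fill = "grey") +
  geom_smooth(color = "red", se = FALSE) +
  scale_x_log10() +
  labs(
    x = "GDP per capita (international dollars, log scale)",
    y = "Life expectancy at birth (years)"
  )

# Convert the static ggplot2 object into an interactive plotly widget
interactive_plot <- ggplotly(p7)
interactive_plot
```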

By hovering your mouse over the far-right points in the figure, you can tell that the country with high income but lower life expectancy is Kuwait. Now re-create this figure using year as the label instead of country, and identify the years that account for these points.

Once you’ve seen the year values associated with the points that combine high income with lower-than-expected life expectancy, formulate a hypothesis that explains them. After you’ve written your hypothesis down, go to Wikipedia’s Kuwait page and read about the country’s modern history. Was your hypothesis correct?


  1. University of Alabama at Birmingham,